Extraction and Visualization of Temporal Information and Related Named Entities from Wikipedia

Author

  • Daryl Woodward
Abstract

This paper describes our process for building a tool that extracts named entities and events from a document and visualizes them in ways that help someone learning about the topic. The ultimate goal is to present the key events in a document, together with their associated people, places, and organizations, so that users can quickly grasp the contents of an article. For testing, we use a set of historical Wikipedia articles on topics such as the American Civil War. These articles contain many named entities of all types, along with many events that have clearly defined time spans. For initial named entity extraction, we incorporate the Stanford NLP CRF into our project. When recognizing location names in this subject area, it achieves an f-measure of only 57.2%. The extracted locations are geocoded through the Google Geocoder, and we plan to disambiguate them with a tree structure in future work. Our package achieves a final f-measure of 79.1% in grounding the extracted locations; this score combines its precision and recall. The grounded locations are then grouped with other named entities related to an event through sentence-level association. Visualization is currently done through Google Maps and the SIMILE Timeline project developed at MIT. We plan to add the ability to refine Wikipedia article searches geospatially and temporally, and to make our tool usable on other online corpora.
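The sentence-level association step can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the author's implementation. It assumes that named entities have already been tagged, for example by a CRF-based recognizer, and passed in as `(surface, label)` pairs with the labels `PERSON`, `LOCATION`, and `ORGANIZATION`. Dates are found with a deliberately simple regular expression. The names `Event`, `associate`, and `DATE_RE` are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified date pattern: "Month D, YYYY" or a bare four-digit year.
MONTHS = ("January|February|March|April|May|June|July|"
          "August|September|October|November|December")
DATE_RE = re.compile(rf"\b(?:(?:{MONTHS})\s+\d{{1,2}},\s+)?\d{{4}}\b")


@dataclass
class Event:
    """One sentence-level event: its dates plus co-occurring entities."""
    sentence: str
    dates: list[str]
    people: list[str] = field(default_factory=list)
    places: list[str] = field(default_factory=list)
    organizations: list[str] = field(default_factory=list)


def associate(sentences):
    """Group tagged entities with the dates found in the same sentence.

    `sentences` is a list of (text, entities) pairs, where `entities` is a
    list of (surface_string, label) tuples as an NER tagger might produce.
    """
    events = []
    for text, entities in sentences:
        dates = DATE_RE.findall(text)
        if not dates:
            continue  # no temporal anchor, so this sentence yields no event
        event = Event(sentence=text, dates=dates)
        for surface, label in entities:
            if label == "PERSON":
                event.people.append(surface)
            elif label == "LOCATION":
                event.places.append(surface)
            elif label == "ORGANIZATION":
                event.organizations.append(surface)
        events.append(event)
    return events
```

For example, with hand-tagged input:

```python
doc = [
    ("The war began on April 12, 1861, at Fort Sumter.",
     [("Fort Sumter", "LOCATION")]),
    ("Robert E. Lee surrendered at Appomattox Court House in 1865.",
     [("Robert E. Lee", "PERSON"), ("Appomattox Court House", "LOCATION")]),
    ("Lincoln gave a speech.", [("Lincoln", "PERSON")]),
]
for e in associate(doc):
    print(e.dates, e.people, e.places)
# ['April 12, 1861'] [] ['Fort Sumter']
# ['1865'] ['Robert E. Lee'] ['Appomattox Court House']
```

The third sentence contains no date, so it produces no event. Each event's places could then be geocoded and plotted on a map, and its dates could be placed on a timeline.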





Publication date: 2010